Assessing Dimensionality of a Set of Items: Comparison of Different Approaches

Abstract 

This study examines the performance of four methodologies for assessing
unidimensionality: Stout's procedure, Holland and Rosenbaum's approach, linear
factor analysis, and nonlinear factor analysis. Each method is examined and compared with
the others on simulated and real test data. Seven data sets were simulated:
three unidimensional and four two-dimensional, all with 2000
examinees. Two levels of correlation between abilities were considered ($\rho$=.3 and $\rho$=.7).
Eight real test data sets were used: four are known to be unidimensional,
and the other four are two-dimensional test data created from unidimensional tests.
Findings suggest that, while linear factor analysis overestimated the number of
underlying dimensions, the other three methods correctly confirmed unidimensionality but
differed in their ability to detect lack of unidimensionality. Stout's procedure showed
excellent power in detecting lack of unidimensionality; Holland and Rosenbaum's and
nonlinear factor analysis approaches showed good power provided the correlation between
abilities is low.






It is well known that most item response theory models require the assumption of
unidimensionality. According to Lord and Novick (1968), dimensionality is defined as the
total number of abilities required to satisfy the assumption of local independence. If only
one ability affecting the responses to a set of items is needed to meet the assumption of local
independence, then that set is referred to as a unidimensional set. It has also long been argued
that test items are multiply determined (Humphreys, 1981, 1985, 1986; Hambleton &
Swaminathan, 1985; Reckase, 1979, 1985; Stout, 1987; Traub, 1983; Yen, 1985) and that several
abilities unique to items or common to relatively few items are inevitable. The ability
that the test is intended to measure (i.e., the ability common to all items) will be referred
to as the dominant ability, and abilities unique to or influencing few items will be referred
to as minor abilities. Given that tests are multiply determined, it is intuitively clear that
satisfying the assumption of unidimensionality requires that a given test
measure a single dominant ability. A number of simulation studies have demonstrated
that the dominant ability can be recovered well, using computer programs such as LOGIST, in
tests with one dominant factor in the presence of several minor factors (Reckase, 1979;
Drasgow & Parsons, 1983; Harrison, 1986). Although counting only dominant dimensions
violates Lord and Novick's (1968) definition of dimensionality, it is commonly accepted
that in order to apply unidimensional item response theory models it is sufficient to show
that there is one dominant ability underlying the responses to a set of items.

Stout (1987, 1990) provided a mathematically rigorous definition of dominant
dimensionality, referred to as essential dimensionality, and provided a statistical test to
assess essential unidimensionality of a set of items. Essential dimensionality is the total
number of abilities required to satisfy the assumption of essential independence. Essential
independence and essential dimensionality are weaker forms of local independence and
traditional dimensionality (Lord & Novick, 1968), respectively. Stout's definition of
essential dimensionality uses an infinite item pool item response theory framework wherein
the item pool is conceptualized as the consequence of continuing the test construction
process in the same manner beyond the construction of the N items of the finite test being
studied. Hence essential dimensionality is defined for the item pool.

In assessing essential unidimensionality using Stout's procedure, one is assessing the 
likelihood that the given set of items comes from an essentially unidimensional item pool.
The major focus in assessing essential unidimensionality of a given set of item responses is 
to determine how minor the influence of minor abilities is and whether the influence of the 
minor abilities can be ignored in assessing essential unidimensionality. 

Historically, linear factor analysis has been used to assess the
dimensionality of the latent space underlying a set of items. If the results indicate a
one-factor solution, then it can be inferred that one dominant ability is influencing item
responses. There are, however, a number of technical as well as methodological problems
associated with using linear factor analysis to assess dimensionality. For example, the
difficulty level of items and the guessing level of multiple-choice items can each play a major
role in altering the factor structure of item responses, resulting in an overestimation of the
number of underlying factors (for details see Carroll, 1945; Hulin, Drasgow, & Parsons,
1983; Zwick, 1987). Consequently, many attempts have been made by researchers in recent
years to develop new methods to assess dimensionality. Some of the recently developed
methods include nonlinear factor analysis (McDonald & Ahlawat, 1974); Bejar's procedure
(Bejar, 1980); order analysis (Wise, 1981); modified parallel analysis (Hulin, Drasgow, &
Parsons, 1983); residual analysis (Hambleton & Swaminathan, 1985); Bock's full
information factor analysis (Bock, Gibbons, & Muraki, 1985); Holland and Rosenbaum's
test of unidimensionality, monotonicity, and conditional independence (Rosenbaum, 1984;
Holland & Rosenbaum, 1986); Humphreys and Tucker's procedures (Tucker, Humphreys,
& Roznowski, 1986); and Stout's unidimensionality procedure (Stout, 1987).

Hattie (1985), Hambleton and Rovinelli (1986), and Berger and Knol (1990) have
reviewed several procedures for assessing dimensionality, including some of the
procedures mentioned above. They concluded that none of the procedures was
satisfactory. The main focus of this paper is to study and compare some of the procedures
to assess dimensionality that are most recent, seem promising, and are little studied. Four
procedures are considered and compared in this paper: nonlinear factor analysis, Holland
and Rosenbaum's procedure, Stout's procedure, and linear factor analysis. Linear factor
analysis is used, because of its historical importance, as a benchmark against which to compare the other
procedures. Several unidimensional and multidimensional test data sets are simulated and used
to study the performance of all four procedures for assessing dimensionality. The same
procedures are then repeated with real test data.



Description of Procedures 
Linear Factor Analysis 



Linear factor analysis is the most commonly used approach to assess dimensionality.
With linear factor analysis, each extracted factor is presumed to represent a dimension or
trait, and the items that load heavily on a given factor are considered good measures of that
dimension. There are a number of fundamental problems associated with applying linear
factor analysis to binary data. First, linear factor analysis assumes that the relationship
between the observed variables and the underlying factors is linear and that the variables
are continuous in nature. But it can be shown that the relationship between
performance and the underlying latent variable is nonlinear. Hence applying factor analysis
to binary responses amounts to approximating the nonlinear relationship by a linear one.
As a result, difficulty factors are produced if guessing is allowed, irrespective of whether
phi or tetrachoric correlations are used (Hulin, Drasgow, & Parsons, 1983). Secondly, in
computing tetrachoric correlations, the cell entries of the fourfold table for a pair of
dichotomous items frequently become zero, thus making it difficult to determine an
appropriate value for the correlation. Thirdly, there are problems associated with determining the
number of significant factors.

In this study the statistical package LISCOMP is used to perform exploratory linear
factor analysis using tetrachoric correlations. Three different approaches are used to
determine the number of significant factors: parallel analysis, the chi-square test of
goodness of fit, and goodness of fit statistics (the means and standard deviations of the
squares of residual correlations and absolute residuals).

In parallel analysis (Humphreys & Montanelli, 1975), the eigenvalues of
the given correlation matrix are compared with the eigenvalues of random data. The
random data consist of binary responses generated randomly with the same number of
items and examinees as the given data. The largest eigenvalue from the random
data is used as the cutoff point for eigenvalues from the actual data to determine the
number of significant factors. That is, the number of eigenvalues of the actual data greater
than the largest eigenvalue of the random data is taken as the number of significant factors
underlying the given data.
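
The counting rule just described can be sketched in a few lines of Python. This is only an illustration of the cutoff rule, not the LISCOMP computation: the function name is hypothetical, the eigenvalues are assumed to have been computed elsewhere, and the numbers in the example are made up rather than taken from the study.

```python
def count_significant_factors(actual_eigenvalues, random_eigenvalues):
    """Count actual-data eigenvalues above the largest random-data eigenvalue."""
    cutoff = max(random_eigenvalues)  # largest eigenvalue of the random data
    return sum(1 for value in actual_eigenvalues if value > cutoff)


# Hypothetical eigenvalues, for illustration only (not results from the study).
actual = [6.1, 1.9, 1.1, 0.9, 0.8]
random_data = [1.3, 1.2, 1.1, 1.0, 0.9]
n_factors = count_significant_factors(actual, random_data)
```

With these illustrative values, two eigenvalues of the actual data exceed the cutoff of 1.3, so two factors would be retained.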

The second method used to determine the number of factors is the chi-square test
of goodness of fit. The third method involves comparing the means and standard deviations
of the squared residuals and absolute residuals after fitting an m-factor model with
the corresponding values from the random data. If the residuals are sufficiently small, then
one can regard the fit of the model as reasonably satisfactory (McDonald, 1981; Hattie,
1985; Hambleton & Rovinelli, 1986; Berger & Knol, 1990).
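
As a rough sketch of the third method, the residual summaries can be computed as below. Several details are assumptions and not taken from the paper: the function name is hypothetical, the reproduced correlation matrix is assumed to come from a fitted m-factor model, only the distinct off-diagonal entries are used, and the population form of the standard deviation is chosen.

```python
from statistics import mean, pstdev


def residual_fit_statistics(observed, reproduced):
    """Means and SDs of squared and absolute residual correlations.

    observed, reproduced -- square correlation matrices as lists of lists;
    only the distinct off-diagonal entries (row > column) are used.
    """
    n = len(observed)
    residuals = [observed[i][j] - reproduced[i][j]
                 for i in range(n) for j in range(i)]
    squared = [r * r for r in residuals]
    absolute = [abs(r) for r in residuals]
    return {
        "mean_squared": mean(squared),
        "sd_squared": pstdev(squared),      # population SD: an assumption
        "mean_absolute": mean(absolute),
        "sd_absolute": pstdev(absolute),
    }


# Hypothetical 3-item example: observed versus model-reproduced correlations.
observed = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.2], [0.3, 0.2, 1.0]]
reproduced = [[1.0, 0.4, 0.3], [0.4, 1.0, 0.4], [0.3, 0.4, 1.0]]
stats = residual_fit_statistics(observed, reproduced)
```

The same function can be applied to the random data, so that the two sets of summaries can be compared side by side.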

Nonlinear Factor Analysis 

McDonald (1967, 1980, 1982) and McDonald and Ahlawat (1974) have demonstrated
that applying linear factor analysis to unidimensional binary data yields "nonlinear
factors" rather than "difficulty factors". McDonald developed the method of nonlinear
factor analysis (NLFA) to account for the nonlinearity of the data, as an improvement over
linear factor analysis. In the context of item response theory, nonlinear factor analysis
seems appropriate because the latent variable is related to performance in a nonlinear
fashion. The variables in the model can be expressed as polynomial functions of latent
traits or factors. For example, a two-factor model with linear and quadratic terms would
be of the following form:

$$Y_i = b_{i0} + b_{i11}\theta_1 + b_{i12}\theta_1^2 + b_{i21}\theta_2 + b_{i22}\theta_2^2 + u_i e_i, \qquad i = 1, 2, \ldots, N,$$

where $Y_i$ denotes the examinee's score on item i, $\theta_j$ denotes the j-th latent trait, $b_{ijk}$ denotes the
factor loading of the i-th item on the j-th common factor for the k-th degree element in the
polynomial, and $u_i$ denotes the unique factor loading for item i. Conceptually, NLFA is
very appealing and seems appropriate to assess the dimensionality of binary responses
conforming to normal ogive or logistic item response models. Hambleton and Rovinelli
(1986) have demonstrated the use of NLFA to assess dimensionality and found it to be a
promising method. They caution, however, about the criterion for the adequacy of the fit of
the model.
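
To make the indexing of the loadings concrete, the common part of the polynomial model above (everything except the unique term) can be evaluated as in the following sketch. The function name and the numerical loadings are hypothetical and serve only to illustrate how factor and degree indices enter the sum; this is not the NOFA estimation routine.

```python
def nlfa_common_part(b0, loadings, thetas):
    """Common part of the polynomial factor model for one item.

    b0       -- intercept b_i0
    loadings -- loadings[j][k] is the loading on factor j+1 for degree k+1
    thetas   -- latent trait values, one per factor
    """
    total = b0
    for theta, factor_loadings in zip(thetas, loadings):
        for degree, b in enumerate(factor_loadings, start=1):
            total += b * theta ** degree
    return total


# Hypothetical loadings for one item under a two-factor quadratic model.
value = nlfa_common_part(0.5, [[0.3, -0.05], [0.2, 0.01]], [1.0, -1.0])
```

A one-factor cubic model corresponds to a single inner list with three loadings, so the same function covers the models compared later in the paper.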

In the present study, NLFA as implemented in the computer program NOFA is used. The
fit of the model is studied just as in the case of linear factor analysis, by comparing the
means and standard deviations of squared residuals and absolute residuals with the
corresponding values for random data and for linear factor analysis. Chi-square statistics
are not available for NLFA and hence were not used.



Holland and Rosenbaum's Test of Lack of Fit of a 
Unidimensional, Monotone, and Conditional Independent Model 

Rosenbaum (1984) and Holland and Rosenbaum (1986) have proved theorems
concerning conditional association that can be applied to assess dimensionality. The basic
notion in Holland and Rosenbaum's (H&R) theorems is that if the items are locally
independent and unidimensional, and the ICCs are monotone, then the items are conditionally
positively associated. Specifically, the conditional covariance between any pair of item
responses in a set of unidimensional dichotomous item responses, given any
function of the remaining item responses, will be nonnegative. This can be hypothesized as

$$H_0: \operatorname{Cov}\Big(X_i, X_j \,\Big|\, \sum_{k \neq i,j} X_k\Big) \geq 0 \quad \text{vs.} \quad H_1: \operatorname{Cov}\Big(X_i, X_j \,\Big|\, \sum_{k \neq i,j} X_k\Big) < 0.$$

Conditional association for each pair of items is tested, given the number-right
score on the remaining items. The Mantel-Haenszel (M-H) test (Mantel & Haenszel, 1954)
is used to test this hypothesis. To perform the M-H test on a given pair of items, a 2x2
contingency table is constructed for the pair for each possible number-right score on
the remaining items. The M-H statistic is given by:

$$Z = \frac{n_{11+} - E(n_{11+}) + 1/2}{\sqrt{V(n_{11+})}},$$

where $n_{11k}$ denotes the observed number of examinees with a score of k on the remaining items who answer
both items i and j correctly, with k = 1, 2, ..., K. $E(n_{11+})$ and $V(n_{11+})$ are the expectation
and variance of $n_{11+}$, respectively, where the plus subscript denotes summation over k.
The computed Z-value is referred to the lower tail of the standard normal distribution. A
statistically significant Z implies that the pair of items in question is not conditionally
positively associated given the sum of the other items, and is thus inconsistent with the unidimensional
model. In this manner the M-H statistic is computed for all N(N-1)/2 pairs of items. If a
large number of pairs are shown not to be conditionally associated, then the unidimensional
assumption is inappropriate.
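
A minimal sketch of the pairwise M-H computation follows. It assumes the standard hypergeometric expectation and variance for each 2x2 table and the 1/2 continuity correction; the function name, the choice to skip strata with fewer than two examinees, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math


def mantel_haenszel_z(responses, i, j):
    """Continuity-corrected M-H Z for items i and j, stratified by rest score.

    responses -- list of examinee response vectors with 0/1 entries
    Returns (z, lower_tail_p), or None when the variance is zero.
    """
    strata = {}
    for row in responses:
        rest = sum(row) - row[i] - row[j]  # number-right score on remaining items
        cell = strata.setdefault(rest, [0, 0, 0, 0])  # n11, n_i, n_j, total
        cell[0] += row[i] * row[j]
        cell[1] += row[i]
        cell[2] += row[j]
        cell[3] += 1

    observed = expected = variance = 0.0
    for n11, n_i, n_j, total in strata.values():
        if total < 2:
            continue  # assumption: a stratum with one examinee is skipped
        observed += n11
        expected += n_i * n_j / total
        variance += (n_i * (total - n_i) * n_j * (total - n_j)
                     / (total ** 2 * (total - 1)))
    if variance == 0:
        return None
    z = (observed - expected + 0.5) / math.sqrt(variance)
    p_lower = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # lower-tail p-value
    return z, p_lower


# Toy data: items 0 and 1 are never answered correctly together.
negative_pair = [[1, 0, 0]] * 10 + [[0, 1, 0]] * 10
z_neg, p_neg = mantel_haenszel_z(negative_pair, 0, 1)
```

In the toy data the two items are negatively associated within the single stratum, so Z is strongly negative and the lower-tail p-value is small, which is the pattern that counts against unidimensionality.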

Since the H&R approach tests each item pair at significance level $\alpha$, the simultaneous
inference for all item pairs can be based on Bonferroni bounds (Holland & Rosenbaum, 1986;
Junker, 1990; Zwick, 1987). According to Bonferroni bounds, one would accept $H_0$ if the
number of rejections at level $\alpha$ is around $n\alpha$, where n is the number of tests performed, and
reject $H_0$ if at least one test is rejected at level $\alpha/n$.
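
The decision rule can be summarized in code as follows. This is a sketch under stated assumptions: the function and key names are hypothetical, the input is taken to be the list of lower-tail p-values from the pairwise M-H tests, and the example p-values are invented. For a 25-item test there are 25(24)/2 = 300 pairs, so with $\alpha$=.05 about $n\alpha$ = 15 rejections are expected under the null hypothesis.

```python
def hr_bonferroni_summary(p_values, alpha=0.05):
    """Summarize pairwise M-H p-values against the n*alpha and alpha/n levels."""
    n = len(p_values)
    at_alpha = sum(1 for p in p_values if p < alpha)
    at_bonferroni = sum(1 for p in p_values if p < alpha / n)
    return {
        "n_tests": n,
        "expected_at_alpha": n * alpha,
        "rejections_at_alpha": at_alpha,
        "rejections_at_alpha_over_n": at_bonferroni,
        # Reject H0 if at least one test is significant at level alpha/n.
        "reject_unidimensionality": at_bonferroni >= 1,
    }


# Invented p-values for the 300 item pairs of a hypothetical 25-item test.
summary = hr_bonferroni_summary([0.5] * 298 + [0.01, 0.0001])
```

Here only two tests fall below $\alpha$, far fewer than the 15 expected, yet one falls below $\alpha/n$, so the two criteria disagree; the paper reports a case of this kind for the LIT data.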

Rosenbaum (1984), Zwick (1987), and Ben-Simon and Cohen (1990) have
demonstrated the application of the H&R approach to assess dimensionality. Ben-Simon and
Cohen found the H&R approach to be conservative: it misclassified nearly
half of the multidimensional item pools they analyzed as unidimensional. Zwick found the
H&R approach to be consistent with the other procedures investigated in confirming the
unidimensionality of NAEP reading data.



Stout's Procedure 



Stout (1987) developed a statistical procedure to test the hypothesis of essential
unidimensionality, that is, the existence of one dominant dimension. The procedure has several
steps, which are briefly described here (for details see Stout, 1987; Nandakumar, 1991). The
hypothesis is stated as

$$H_0: d_E = 1 \quad \text{vs.} \quad H_1: d_E > 1,$$

where $d_E$ denotes the essential dimensionality of the item pool in which the given test
responses are assumed to be embedded. The J examinees are partitioned into two groups.
One group of examinees is used for exploratory factor analysis to select items for subtests,
and the other group of examinees is used to compute Stout's statistic T. The N test items
are split into three subsets: AT1, AT2, and PT. The items of subtest AT1 are chosen such
that they all measure the same dominant ability; the items of AT2 are matched in
difficulty with the items of AT1 to correct for difficulty and guessing factors in item responses;
and the rest of the items are used for PT. The subtest PT is used to split examinees into K
subgroups based on their PT score; the subtests AT1 and AT2 are used to compute the
unidimensionality statistic T given by:

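$$T = \frac{T_1 - T_2}{\sqrt{2}},$$

where

$$T_i = \frac{1}{\sqrt{K}} \sum_{k=1}^{K} \frac{\hat{\sigma}_k^2 - \hat{\sigma}_{U,k}^2}{S_k}$$

is computed using the items of AT$i$ ($i = 1, 2$). The $\hat{\sigma}_k^2$, $\hat{\sigma}_{U,k}^2$, and $S_k$ are given as follows.
The usual variance estimate for subgroup k is given by

$$\hat{\sigma}_k^2 = \sum_{j=1}^{J_k} \left(Y_j^{(k)} - \bar{Y}^{(k)}\right)^2 / J_k,$$

where

$$Y_j^{(k)} = \sum_{i=1}^{M} U_{ijk} / M, \qquad \bar{Y}^{(k)} = \sum_{j=1}^{J_k} Y_j^{(k)} / J_k,$$

with $U_{ijk}$ (1 or 0) denoting the response for item i by examinee j in subgroup k, $M$ denoting
the number of items in the subtest, and $J_k$ denoting the total number of examinees in subgroup k.
The "unidimensional" variance estimate for subgroup k is given by

$$\hat{\sigma}_{U,k}^2 = \sum_{i=1}^{M} \hat{p}_i^{(k)} \left(1 - \hat{p}_i^{(k)}\right) / M^2,$$

where

$$\hat{p}_i^{(k)} = \sum_{j=1}^{J_k} U_{ijk} / J_k.$$

And the standard error of estimate for subgroup k is given by

$$S_k = \left[\left(\hat{\mu}_{4,k} - \hat{\sigma}_k^4\right) + \hat{\delta}_{4,k} / M^4\right]^{1/2} / J_k^{1/2},$$

where

$$\hat{\mu}_{4,k} = \sum_{j=1}^{J_k} \left(Y_j^{(k)} - \bar{Y}^{(k)}\right)^4 / J_k, \qquad \hat{\delta}_{4,k} = \sum_{i=1}^{M} \hat{p}_i^{(k)} \left(1 - \hat{p}_i^{(k)}\right) \left(1 - 2\hat{p}_i^{(k)}\right)^2.$$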

The computed T value is referred to the upper tail of the standard normal
distribution to obtain the significance level. The p-values of unidimensional tests are
expected to be large, while the p-values of multidimensional tests are expected to fall below
the specified level of significance.
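
The computation of T from the two assessment subtests can be sketched as below, using the usual variance estimate, the "unidimensional" variance estimate, and the standard error defined above, with the fourth-moment terms taken from Stout (1987). This is an illustrative sketch, not the refined procedure of Nandakumar and Stout (1991): the function names, the `min_size` threshold for dropping small subgroups, and the handling of degenerate subgroups are assumptions, and the item-selection step for AT1, AT2, and PT is not shown.

```python
import math


def stout_component(at_responses, pt_scores, min_size=2):
    """T_i for one assessment subtest, conditioning on the PT score.

    at_responses -- per-examinee 0/1 response vectors on the subtest items
    pt_scores    -- per-examinee number-right scores on PT
    min_size     -- hypothetical minimum subgroup size; smaller groups are skipped
    """
    groups = {}
    for row, score in zip(at_responses, pt_scores):
        groups.setdefault(score, []).append(row)

    terms = []
    for rows in groups.values():
        j_k, m = len(rows), len(rows[0])
        if j_k < min_size:
            continue
        y = [sum(row) / m for row in rows]             # proportion-correct scores
        y_bar = sum(y) / j_k
        var = sum((v - y_bar) ** 2 for v in y) / j_k   # usual variance estimate
        mu4 = sum((v - y_bar) ** 4 for v in y) / j_k   # fourth central moment
        p = [sum(row[i] for row in rows) / j_k for i in range(m)]
        var_u = sum(q * (1 - q) for q in p) / m ** 2   # "unidimensional" estimate
        delta4 = sum(q * (1 - q) * (1 - 2 * q) ** 2 for q in p)
        s_sq = ((mu4 - var ** 2) + delta4 / m ** 4) / j_k
        if s_sq <= 0:
            continue                                   # degenerate subgroup, skipped
        terms.append((var - var_u) / math.sqrt(s_sq))
    if not terms:
        raise ValueError("no usable subgroups")
    return sum(terms) / math.sqrt(len(terms))          # K = number of usable subgroups


def stout_t(at1, at2, pt_scores, min_size=2):
    """Return (T, upper-tail p-value) from subtests AT1 and AT2."""
    t1 = stout_component(at1, pt_scores, min_size)
    t2 = stout_component(at2, pt_scores, min_size)
    t = (t1 - t2) / math.sqrt(2.0)
    p_upper = 0.5 * (1.0 - math.erf(t / math.sqrt(2.0)))
    return t, p_upper


# Tiny hand-checkable example: four examinees in one PT subgroup, two items.
toy = [[1, 1], [1, 1], [1, 1], [0, 0]]
t_component = stout_component(toy, [3, 3, 3, 3])
```

For the toy subgroup the usual variance estimate is 0.1875 and the "unidimensional" estimate is 0.09375, which gives a component of about 0.816. Real applications would involve many subgroups and far more examinees.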

Stout's procedure, as refined by Nandakumar and Stout (1991), is used for assessing
dimensionality in the present study. Stout's procedure has been found to discriminate
well between unidimensional and two-dimensional tests in a variety of simulated test data
for correlations between abilities as high as .7 (Stout, 1987; Nandakumar & Stout, 1991).
Nandakumar (1991) has shown the usefulness of Stout's procedure in assessing essential
unidimensionality in the possible presence of several minor abilities. Nandakumar (1989)
applied Stout's procedure to several real test data sets and found that the procedure
correctly confirmed the unidimensionality of test data that were previously shown to be
unidimensional by others. For two-dimensional test data, created by combining the
unidimensional test data, Stout's procedure exhibited good power.

Description of Test Data 
The Simulated Test Data 

Seven data sets, DATA1-DATA7, are simulated. Of the seven, three data sets,
DATA1-DATA3, are strictly unidimensional tests consisting of 25, 40, and 50 items,
respectively. The other four data sets, DATA4-DATA7, are each two-dimensional with
length N=25 and correlation between abilities $\rho$=.3, N=25 and $\rho$=.7, N=50 and $\rho$=.3, and
N=50 and $\rho$=.7, respectively. All data sets have 2000 examinees. These test data are
described in Table 1. The unidimensional test data are generated according to the
three-parameter logistic model. The abilities are independently generated from the
standard normal distribution, and the item parameters $(a_i, b_i)$ of real tests as described in
Nandakumar (1991) are used in generating item responses. For example, items of DATA1
have a larger variability in discrimination power, ranging from 1.22 to 2.82; items of
DATA2 have a smaller variability of $a_i$, ranging from 1.07 to 2.00. For each simulated
examinee, the probability of correctly answering each item, $P_i(\theta)$, was computed using the
three-parameter logistic model. For each item i, a random number between 0 and 1 was
generated from a pseudo-uniform distribution. If the computed probability $P_i(\theta)$ is greater
than or equal to the random number generated, the examinee was said to have answered
the item correctly and was given a score of 1; otherwise the examinee was given a score of
0. The two-dimensional test data were generated according to the multidimensional
compensatory model (Reckase & McKinley, 1983). The abilities $\theta = (\theta_1, \theta_2)$ were generated
from a bivariate normal distribution with both means zero and both variances one. The
correlation coefficient between the abilities varied appropriately. The pseudo-guessing level
was taken to be .20 for all tests. The discrimination parameters $(a_{1i}, a_{2i})$ for each item were
generated as follows:

where $\mu$ and $\sigma$ are the mean and standard deviation of the distribution of discrimination
parameters of the respective unidimensional tests with the same number of items. Similarly,
$b_{1i}$ and $b_{2i}$ were assumed to be independent of each other for each item and were generated
as follows:

$$b_{1i} \sim N(\mu, \sigma), \qquad b_{2i} \sim N(\mu, \sigma),$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the distribution of difficulty
parameters of the respective unidimensional test with the same number of items. For
example, to generate test data DATA4 with N=25 and $\rho$=.3, the means and standard
deviations of the $a_i$s and $b_i$s of the item parameters used for DATA1 were used. The item responses
(0,1) were generated exactly as described for the unidimensional case, using the $P_i(\theta)$ of a
two-dimensional compensatory model.
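
The response-generation steps described above can be sketched as follows. Several details are assumptions rather than facts from the paper: the scaling constant D = 1.7, the exact parameterization of the compensatory model (separate difficulties $b_{1i}$ and $b_{2i}$ inside the exponent), the function names, and the item parameters in the example, which are invented. The generation of the item parameters themselves is not shown.

```python
import math
import random

D = 1.7  # assumed logistic scaling constant; the text does not state one


def p_3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def p_compensatory(theta1, theta2, a1, a2, b1, b2, c):
    """Two-dimensional compensatory model (assumed parameterization)."""
    z = a1 * (theta1 - b1) + a2 * (theta2 - b2)
    return c + (1.0 - c) / (1.0 + math.exp(-D * z))


def draw_abilities(rho, rng):
    """Standard bivariate normal abilities with correlation rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2


def score(prob, rng):
    """Score 1 if the probability is at least a uniform random number, else 0."""
    return 1 if prob >= rng.random() else 0


def simulate_two_dimensional(items, n_examinees, rho, seed=0):
    """items -- list of (a1, a2, b1, b2, c) tuples; returns a 0/1 response matrix."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examinees):
        t1, t2 = draw_abilities(rho, rng)
        data.append([score(p_compensatory(t1, t2, *item), rng) for item in items])
    return data


# Invented item parameters, with the guessing level of .20 used in the study.
toy_items = [(1.0, 0.5, 0.0, 0.0, 0.2)] * 5
toy_data = simulate_two_dimensional(toy_items, 50, rho=0.3)
```

A unidimensional data set would be generated the same way, with `p_3pl` in place of `p_compensatory` and a single standard normal ability per examinee.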

The Real Test Data 

The real test data used in this study came from two different sources. The National
Assessment of Educational Progress (NAEP, 1988) data for the tests US History (HIST) and
Literature (LIT) for grade 11/age 17 were obtained from Educational Testing Service. The
Armed Services Vocational Aptitude Battery (ASVAB) data for Arithmetic Reasoning
(AR) and General Science (GS) for grade 10 were obtained from Linn, Hastings, Hu, and
Ryan (1987). The details about these data sets are described in Table 1. Since all four
test data sets are considered to be unidimensional, they were combined to form pseudo
two-dimensional tests (Zwick, 1987; Nandakumar, 1989). Four two-dimensional tests were
formed as follows. The test data HSTLIT1 was formed by combining the data on the 31 items
of HIST with the data on 5 items of LIT randomly selected from its 30 items. Since HIST
contains more examinees than LIT, excess examinees in HIST were randomly deleted in
order to equalize the number of examinees in the two data sets. Similarly, the data on 10 items from LIT were
combined with the data on the 31 items of HIST to form HSTLIT2, and the data on 10 items
from GS were combined with the data on the 30 items of AR to form ARGS. These three are
pseudo two-dimensional test data because it is not known if the same examinees took both
tests. Hence the correlation between the abilities is considered to be zero. The last
two-dimensional test, HSTGEO, consisting of 36 items, differs from the other two-dimensional
tests in that the same examinees took both sets of items. HSTGEO contains the 31 history
items spanning US history from the colonization period to modern times (HIST) and, in
addition, 5 map items requiring knowledge of the geographical location of different
countries in the world. This is the actual history test according to NAEP. It was shown
using Stout's procedure that the 5 map items formed a separate dimension (Nandakumar,
1989). Hence the data on these 5 map items were removed from the history test to form
HIST with 31 items, and the original history test is treated as a natural (as opposed to pseudo)
two-dimensional test (HSTGEO).



Results 



The results of Stout's procedure and the H&R approach will be studied together and 
compared because of the similarity in the underlying theory and because both of them are 
statistical tests. Likewise the results of linear and nonlinear factor analysis will be studied 
and compared together. 

The Simulated Test Data 

Stout's and H&R Procedures

The results of Stout's procedure and the H&R approach for simulated data are
presented at the top of Table 2. For all test data the p-values associated with Stout's
procedure indicate that Stout's procedure is able to correctly confirm unidimensionality
and detect lack of unidimensionality for both levels of correlation between abilities, $\rho$=.3 and
$\rho$=.7. For example, all three unidimensional test data sets, DATA1-DATA3, have large
p-values, implying acceptance of the null hypothesis of essential unidimensionality
(here the tests are strictly unidimensional). For the two-dimensional data, on the other hand,
the associated p-values are very small, strongly rejecting the null hypothesis of essential
unidimensionality.

The results of the H&R approach indicate that for the unidimensional tests
DATA1-DATA3, the numbers of significant negative partial associations at level $\alpha$ ($\alpha$=.05)
are far below the expected number ($n\alpha$), strongly confirming the unidimensional nature of
the test data. Among the two-dimensional tests, DATA4 and DATA6 (both with $\rho$=.3) were
correctly assessed as multidimensional. For DATA4 and DATA6 the numbers of significant
negative partial associations at level $\alpha$ exceeded $n\alpha$, and the numbers of significant
negative partial associations beyond level $\alpha/n$ were 15 and 1, respectively, marking them as
multidimensional. The test data DATA5 and DATA7 (both with $\rho$=.7), on the other hand,
are assessed as unidimensional. For DATA5 and DATA7 the numbers of significant negative
partial associations at level $\alpha$ are within $n\alpha$, and the number of significant negative
partial associations beyond level $\alpha/n$ is zero, hence classifying them as unidimensional tests. It
was disappointing to note that for many of the item pairs measuring different traits in the
two-dimensional tests, the covariance did not approach significance. One reason for this
could be noise in the conditional score. More research is necessary to draw definite
conclusions.

Linear and Nonlinear Factor Analysis 

The computer programs used for the analyses, LISCOMP and NOFA, are
computationally intensive and consume enormous CPU time. In addition, the LISCOMP
program cannot handle more than about 40 variables. For these reasons only a selection of
simulated data sets was included in the linear factor analyses, but all test data were
included in the nonlinear factor analyses. The results of linear and nonlinear factor analysis
are presented in Table 3.

Based on parallel analysis, one factor would be retained for DATA1, DATA2, and
DATA5, and two factors would be retained for DATA4. According to the p-values
associated with the chi-square test of goodness of fit in Table 3, however, a two-factor model fits
DATA1, only a model with more than four factors fits DATA2 and DATA4, and a three-factor model fits
DATA5. Similar chi-square values are not available for nonlinear models and hence are not
reported.

The goodness of fit statistics, the means and standard deviations of squared
residuals and absolute residuals, are reported for all test data in Table 3. The top entry in
Table 3 refers to random data (RANDOM) with 25 variables and 2000 examinees. Because
of the cost of computations, only one random data set is used to compare the goodness of fit
statistics. A comparison of the goodness of fit statistics of RANDOM with those of DATA1 suggests
that the one-factor quadratic model fits the data better than the four-factor linear model. Hence
the nonlinear model accurately confirms the unidimensional nature of the items. The one-factor
cubic model is no better than the one-factor quadratic model. A similar observation can be
made for DATA2. A comparison of the goodness of fit statistics for linear and nonlinear factor
analysis shows that for DATA4 and DATA5 the two-factor quadratic model fits
better than the three-factor linear model, confirming the two-dimensional nature of the data. As
expected, the means and standard deviations of squared residuals and absolute residuals are
much larger for DATA4 ($\rho$=.3) than for DATA5 ($\rho$=.7), reflecting more
multidimensionality. For DATA5, although the two-factor quadratic model fits better than the
one-factor quadratic model, the difference in goodness of fit statistics is so small that one
is tempted to accept the one-factor quadratic model. Likewise, the two-factor quadratic model fits
better than the one-factor quadratic model for DATA6, and the one-factor quadratic model fits DATA7.

In summary, linear factor analysis either underestimates or overestimates the
number of factors and hence is not adequate for assessing dimensionality. The other three
procedures are excellent in confirming unidimensionality. Stout's procedure has
demonstrated greater power in detecting multidimensionality for correlations between
abilities as high as .7. The H&R and nonlinear factor analysis methods have demonstrated good
power provided the correlation between abilities is low.



The Real Test Data

Stout's and H&R Procedures

The results of Stout's procedure and the H&R approach for real test data are presented at the bottom of
Table 2. For all test data the p-values associated with Stout's procedure indicate that
Stout's procedure is able to correctly confirm unidimensionality and to detect lack of
unidimensionality in cases where the test data are contaminated by as little as 15% of data
from a second dominant dimension (for example, HSTLIT1).

For the LIT data, the p-value associated with Stout's procedure is borderline,
tending towards acceptance of $H_0$. The p-values associated with HIST, AR, and GS are
large, leading to acceptance of $H_0$. The relatively small p-values for LIT and AR suggest that
there is some multidimensionality present in these test data. For all two-dimensional tests,
the associated p-values are very small, strongly confirming the multidimensional nature of
these data. This is true both for correlated abilities (HSTGEO) and for uncorrelated
abilities (HSTLIT1, HSTLIT2, ARGS). The p-value for HSTLIT1 is larger than that for
HSTLIT2, suggesting a greater degree of multidimensionality in HSTLIT2.

The results of the H&R approach are consistent with Stout's procedure in assessing
unidimensionality. For two-dimensional tests, however, the H&R approach does not seem
to exhibit good power. While test data HIST and AR were clearly confirmed as
unidimensional, for test data LIT the decision is not clear. Although the number of
significant negative partial associations for LIT is less than the maximum allowed
($n\alpha$=22), one of the M-H tests was found to be significant beyond the $\alpha/n$ level, suggesting
a significant presence of multidimensionality in the data. For the two-dimensional tests
HSTLIT1, ARGS, and HSTGEO, the number of significant negative partial associations is
far below the $n\alpha$ level, suggesting a unidimensional nature of these data. For HSTLIT2,
however, the number of significant negative partial associations is well above the $n\alpha$ level,
suggesting the presence of multidimensionality, but none of the M-H tests was significant
beyond level $\alpha/n$ to confirm multidimensionality. Hence the decision about dimensionality
is not clear, although one is tempted towards multidimensionality.

On closer examination it was found that the M-H z-values for many of the item
pairs in which the items were supposed to be measuring different traits were negative but not
statistically significant. One explanation for this could be that for these item pairs the
conditional score ($\sum X_k$), on the basis of which the examinees are classified into different
groups, is confounded by noise. This is especially true for HSTLIT2 and ARGS, where one
quarter of the test items belong to the second dominant dimension. Because of the noise in the
conditional score distribution, item pairs measuring different abilities may
not exhibit significant negative covariance. A proper conditional score could
considerably increase the power of the H&R approach.

Linear and Nonlinear Factor Analysis

The results of linear and nonlinear factor analysis for a selection of tests are
reported in Table 4. The results are consistent with those for the simulated test data in that in all
cases nonlinear factor models fit more accurately than linear factor models. According to
the chi-square test of goodness of fit, only a model with more than four factors fits any of the test data for which
linear factor analysis is performed. Based on goodness of fit statistics, the one-factor quadratic
model fits the test data LIT, AR, and HSTLIT better than a three- or four-factor linear
model. Also, the one-factor quadratic model fits as well as the two-factor quadratic model. In
the interest of parsimony, therefore, the one-factor quadratic model is the right choice. For HSTLIT2
and ARGS the two-factor quadratic model fits better than the one-factor quadratic model, and the three-factor
quadratic model is no better than the two-factor quadratic model. But the distinction in the fit statistics
between the one-factor quadratic and two-factor quadratic models is not clear. If chi-square statistics
were available along with the goodness of fit statistics, they would have aided the
interpretation.

In summary, for real test data, the results are broadly consistent with those for the simulated 
test data. Linear factor analysis overestimates the number of underlying dimensions and is 
not adequate for assessing the fit of the model. The other three methodologies were 
excellent in confirming unidimensionality but differed in detecting lack of unidimensionality. 
Stout's procedure demonstrated greater power than either the H&R or the nonlinear 
factor analysis methods. With an appropriate conditional score the power of the H&R 
approach could be improved, and with some type of fit statistic the power of nonlinear 
factor analysis could be improved. 

Discussion 

The findings of this limited study demonstrate that the linear factor analysis 
approach to assessing dimensionality is not adequate. This finding is consistent with 
previous research (see, for example, Hambleton & Rovinelli, 1986; Hattie, 1984). In 
contrast to linear factor analysis, Stout's, H&R, and nonlinear factor analysis were each 
shown to be promising methodologies for assessing dimensionality. The findings should be 
interpreted with caution, because a limitation of this study was the way the 
two-dimensional real test data (except HSTGEO) were created: the item responses were combined 
from two different tests that were not administered to the same group of examinees. The results 
may have been slightly different had the same examinees taken both sets of items. 

In this study all three methodologies were able to discriminate between 
one- and two-dimensional test data. For known unidimensional test data, both simulated 
and real, all three procedures were able to confirm unidimensionality. For 
two-dimensional tests, however, the three procedures differed in their ability to detect the 
lack of unidimensionality. Stout's procedure rejected the null hypothesis of essential 
unidimensionality for all two-dimensional tests, both real and simulated. The H&R 
approach confirmed the lack of unidimensionality for two-dimensional simulated tests 
provided the correlation between abilities was low (ρ=.3). For simulated test data with 
high correlation between abilities (ρ=.7) the H&R approach was unable to detect 
multidimensionality. In addition, for all two-dimensional real test data, the H&R 
approach was unable to detect multidimensionality. The performance of the nonlinear factor 
analysis methodology was similar to that of the H&R procedure for two-dimensional tests. For 
simulated test data with ρ=.3, the two-factor model with linear and quadratic terms 
demonstrated adequate fit statistics (smaller means and standard deviations of squared 
residuals and absolute residuals). For simulated tests with ρ=.7, however, the difference 
in fit statistics between one-factor and two-factor quadratic models was not evident. 
Similarly, for two-dimensional real test data HSTLIT2 and ARGS, the difference in fit 
statistics between one-factor and two-factor models with linear and quadratic terms was 
not evident. The difficulty in deciding on the correct model arises because there is no 
concrete way of assessing what is meant by sufficiently small for goodness of fit statistics. 
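
The four descriptive fit statistics named above can be computed from a matrix of residual correlations as in the following sketch. The function name `residual_fit_statistics`, the use of the off-diagonal upper triangle, and the choice of the population standard deviation are assumptions made for illustration; the source does not specify these details.

```python
from statistics import mean, pstdev


def residual_fit_statistics(residuals):
    """Mean and SD of squared and absolute residual correlations.

    residuals -- square symmetric matrix (list of lists) of residual
    correlations r_ij; only pairs with i < j are used.
    """
    n = len(residuals)
    off_diag = [residuals[i][j] for i in range(n) for j in range(i + 1, n)]
    squared = [r * r for r in off_diag]
    absolute = [abs(r) for r in off_diag]
    return {
        "mean_sq": mean(squared),
        "sd_sq": pstdev(squared),    # population SD is an assumption
        "mean_abs": mean(absolute),
        "sd_abs": pstdev(absolute),
    }
```

Smaller values indicate better fit, but, as noted above, the statistics carry no reference distribution, so comparisons between models remain descriptive.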

In this study the results associated with the H&R approach were consistent with the 
findings of Ben-Simon and Cohen (1990) and Zwick (1987). The number of 
significant negative partial associations for unidimensional tests was far below the 
expected five percent level, making it a very conservative test. Consequently it did not 
exhibit high power. According to the theorems proved by Holland and Rosenbaum (1986), 
the conditional score used to compute the covariances can be any function of the latent 
trait. An appropriate choice of conditional score could therefore maximize the power of 
the H&R approach. 

The results of the nonlinear factor analyses were consistent with the findings of 
Hambleton and Rovinelli (1986). Factor models with linear and quadratic terms were able 
to fit the data better than models with just linear terms. The problem with nonlinear 
factor analysis is choosing the appropriate number of polynomial terms to retain in the model. 
This suggests that some type of adequacy of fit statistic with associated p-values would be 
necessary to aid in assessing the fit of nonlinear models. 

In terms of assessing the degree of multidimensionality, both Stout's and nonlinear 
factor analysis approaches can be useful. The p-values associated with Stout's procedure 
and the fit statistics associated with nonlinear factor analysis can be helpful in assessing 
the degree of multidimensionality. For example, both HIST and AR are unidimensional 
tests, but the associated p-values are .937 and .118, respectively. By contrast, for the 
two-dimensional test HSTLIT2, p=.000. The difference in the p-values mirrors the degree 
of multidimensionality present in the data. Similarly, the difference in fit statistics between 
one-factor and two-factor quadratic models for DATA1 and DATA4 reflects the degree of 
multidimensionality. 
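
The p-values quoted above can be reproduced from the T statistics in Table 2 if T is referred to the upper tail of the standard normal distribution, which is how Stout's statistic is usually evaluated. The following sketch assumes that one-sided reading; the function name `upper_tail_p` is hypothetical.

```python
import math


def upper_tail_p(t):
    """One-sided p-value P(Z >= t) for a standard normal Z."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


# T statistics as reported in Table 2.
for name, t in [("HIST", -1.53), ("AR", 1.18), ("HSTLIT2", 8.9)]:
    print(name, round(upper_tail_p(t), 3))
```

The values agree with the reported p-values up to rounding of T, which supports treating the p-value as a rough index of how far a test departs from essential unidimensionality.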

Just as the linear and nonlinear methodologies share the same underlying theory, 
Stout's and H&R approaches share the same theoretical framework. The basic rationale for 
the H&R approach is to reject the locally independent, monotone, unidimensional model if 
the conditional covariances are significantly negative. By contrast, Stout's procedure 
rejects the essentially independent, monotone, essentially unidimensional model if the 
conditional covariances are significantly positive (it can be shown that the expected value 
of the numerator of Stout's statistic T is mathematically equivalent to the average conditional 
covariance among the AT1 items; Stout, 1987). This apparent contradiction in the criterion 
for assessing unidimensionality may be resolved by noting the subtle difference in the item pair 
covariances under consideration. In the H&R approach one expects the conditional 
covariance between items measuring different traits to be negative, whereas in Stout's 
approach one expects the asymptotic conditional covariance between items measuring the 
same trait to approach zero. Stout's procedure is specifically designed to assess 
unidimensionality and hence looks for the existence of at least two dominant dimensions. 
By contrast, the H&R approach looks at all item pairs and detects items that are not 
measuring the same trait as the other items of the test. 

As for the computational time involved, Stout's procedure is the most efficient. The 
other procedures require significantly more computational time. For example, for a 
25-item test with 2000 examinees, Stout's procedure uses 4 seconds of CPU time, the H&R 
approach uses 24 seconds, and nonlinear factor analysis uses 42 seconds; for a 50-item test 
with 2000 examinees, Stout's procedure uses 8 seconds, the H&R approach uses 106 seconds, 
and nonlinear factor analysis uses 191 seconds. As the test length increases, the H&R approach 
requires disproportionately more time, and the same is true for nonlinear factor analysis as 
test length increases and/or the model gets more complex. 
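
The timings above can be set against the number of item pairs, N(N-1)/2, which is the quantity n reported in Table 2. The sketch below does this arithmetic; the suggestion that pairwise methods scale roughly with the number of pairs is an interpretation offered here, not a claim made in the study, and the helper name `item_pairs` is hypothetical.

```python
def item_pairs(n_items):
    """Number of distinct item pairs in a test of n_items items."""
    return n_items * (n_items - 1) // 2


pairs_25, pairs_50 = item_pairs(25), item_pairs(50)
print("pairs:", pairs_25, pairs_50, "ratio:", round(pairs_50 / pairs_25, 2))

# CPU seconds reported in the text for 25- and 50-item tests.
timings = [("Stout", 4, 8), ("H&R", 24, 106), ("Nonlinear FA", 42, 191)]
for method, t25, t50 in timings:
    print(method, "time ratio:", round(t50 / t25, 2))
```

Doubling the test length roughly quadruples the number of pairs, and the H&R and nonlinear factor analysis times grow by a similar factor, whereas the time for Stout's procedure only doubles.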
Table 1 
Description of Data Sets 

                                             Number of items of each trait 
Name       J*      Traits   ρ**    N***      Trait1    Trait2    Mixed**** 

Simulated test data 
DATA1      2000    1               25        25        0         0 
DATA2      2000    1               40        40        0         0 
DATA3      2000    1               50        50        0         0 
DATA4      2000    2        .3     25        8         8         9 
DATA5      2000    2        .7     25        8         8         9 
DATA6      2000    2        .3     50        16        16        17 
DATA7      2000    2        .7     50        16        16        17 

Real test data 
LIT        2380    1               30        30        0         0 
HIST       2425    1               31        31        0         0 
AR         1984    1               30        30        0         0 
GS         1990    1               25        25        0         0 
HSTLIT1    2380    2        0      36        31        5         0 
HSTLIT2    2380    2        0      41        31        10        0 
ARGS       1984    2        0      40        30        10        0 
HSTGEO     2425    ?        7      36        31        5         0 

*    J denotes the number of examinees 
**   ρ denotes the correlation between traits 
***  N denotes the test length 
**** mixed items are a combination of both traits 1 and 2 
Table 2 

Results of Stout and H&R Analyses 

           Stout's Test                     H&R Test 
           H0: d = 1                        H0: Cov(X_i, X_j | ΣX_k) ≥ 0 

                            Decision        No. of    No. of pairs   No. of pairs   Decision 
                            based on        item      significant    significant    based on 
                            Stout's         pairs     at level α     at level α/n   Bonferroni 
Name       T        p<      procedure       n                                       bounds 

Simulated test data 
DATA1      -1.05    .85     accept H0       300       1              0              accept H0 
DATA2      -0.75    .77     accept          780       3              0              accept 
DATA3      -0.94    .83     accept          1225      10             0              accept 
DATA4      7.19     .000    reject          300       71             15             reject 
DATA5      3.62     .000    reject          300       10             0              accept 
DATA6      10.13    .000    reject          1225      206            1              reject 
DATA7      2.41     .008    reject          1225      56             0              accept 

Real test data 
LIT        1.70     .045    accept          435       16             1              undecided 
HIST       -1.53    .937    accept          465       6              0              accept 
AR         1.18     .118    accept          435       3              0              accept 
GS         -0.14    .555    accept          300       6              0              accept 
HSTLIT1    2.75     .003    reject          630       18             0              accept 
HSTLIT2    8.9      .000    reject          820       83             0              undecided 
ARGS       8.34     .000    reject          780       37             0              accept 
HSTGEO     6.83     .000    reject          630       16             0              accept 

* significant at .05 level 



Table 3 

Results of Linear and Nonlinear Factor Analysis 
For Simulated Test Data: Goodness of Fit Statistics 

                                mean r_ij^2*   SD(r_ij^2)    mean |r_ij|   SD(|r_ij|)    p<** 

RANDOM 
  Linear Factor Analysis 
    1 Factor                    .0009          .0308         .0250         .0182 
    2 Factor                    .0008          [illegible]   [illegible]   [illegible] 
    3 Factor                    .0007          .0246         .0207         .0160 
    4 Factor                    .0006          [illegible]   [illegible]   [illegible] 

DATA1 
  Linear Factor Analysis 
    1 Factor                    .0017          .0412         .0333         .0242         .006 
    2 Factor                    .0013          .0359         .0286         .0218         .350 
    3 Factor                    .0011          .0332         .0262         .0204         .610 
    4 Factor                    .0009          .0303         .0236         .0191         .860 
  Nonlinear Factor Analysis 
    1 Factor Quadratic          .0003          .0185         .0147         .0113 
    1 Factor Cubic              .0003          .0185         .0147         .0113 

DATA2 
  Linear Factor Analysis 
    1 Factor                    .0110          .1049         .0982         .0369         .000 
    2 Factor                    .0091          .0954         .0896         .0327         .000 
    3 Factor                    .0070          .0834         .0774         .0310         .000 
    4 Factor                    .0061          .0779         .0720         .0278         .000 
  Nonlinear Factor Analysis 
    1 Factor Quadratic          .0003          .0186         .0148         .0113 
    1 Factor Cubic              .0003          .0185         .0148         .0113 

DATA3 
  Nonlinear Factor Analysis 
    1 Factor Quadratic          .0003          .0186         .0147         .0115 
    1 Factor Cubic              .0003          .0175         .0138         .0108 

DATA4 
  Linear Factor Analysis 
    1 Factor                    .0203          .1425         .1108         .0900         .000 
    2 Factor                    .0017          .0412         .0334         .0240         .000 
    3 Factor                    .0012          .0346         .0276         .0212         .008 
  Nonlinear Factor Analysis 
    1 Factor Quadratic          .0021          .0465         .0523         .0379 
    2 Factor Quadratic          .0003          .0171         .0131         .0109 

DATA5 
  Linear Factor Analysis 
    1 Factor                    .0047          .0686         .0556         .0409         .000 
    2 Factor                    .0014          .0374         .0313         .0218         .011 
    3 Factor                    .0012          .0346         .0289         .0199         .245 
    4 Factor                    .0010          .0316         .0254         .0181         .600 
  Nonlinear Factor Analysis 
    1 Factor Quadratic          .0009          .0307         .0246         .0186 
    2 Factor Quadratic          .0003          .0174         .0138         .0107 

DATA6 
  Nonlinear Factor Analysis 
    1 Factor Quadratic          .0005          .0242         .0204         .0172 
    2 Factor Quadratic          .0003          .0182         .0145         .0111 

DATA7 
  Nonlinear Factor Analysis 
    1 Factor Quadratic          .0005          .0223         .0176         .0137 
    2 Factor Quadratic          .0003          .0175         .0140         .0105 

Models: 
  1 Factor Quadratic:  $Y_i = b_{i0} + b_{i1}\theta + b_{i2}\theta^2 + b_{i3}e_i$ 
  1 Factor Cubic:      $Y_i = b_{i0} + b_{i1}\theta + b_{i2}\theta^2 + b_{i3}\theta^3 + b_{i4}e_i$ 
  2 Factor Quadratic:  $Y_i = b_{i0} + b_{i11}\theta_1 + b_{i12}\theta_1^2 + b_{i21}\theta_2 + b_{i22}\theta_2^2 + b_{i3}e_i$ 

*  r_ij are the residual correlations 
** p-value associated with the chi-square test of goodness of fit. 
Table 4 

Results of Linear and Nonlinear Factor Analysis 
For Real Test Data: Goodness of Fit Statistics 

                                mean r_ij^2*   SD(r_ij^2)    mean |r_ij|   SD(|r_ij|)    p<** 

LIT 
  Linear Factor Analysis 
    1 Factor                    .0034          .0584         .0465         .0354         .000 
    2 Factor                    .0028          .0526         .0428         .0307         .000 
    3 Factor                    .0019          .0439         .0349         .0267         .000 
    4 Factor                    .0015          .0391         .0310         .0240         .000 
  Nonlinear Factor Analysis 
    1 Factor Quadratic          .0008          .0278         .0216         .0176 
    2 Factor Quadratic          .0004          .0207         .0162         .0130 

AR 
  Linear Factor Analysis 
    1 Factor                    .0047          .0683         .0569         .0378         .000 
    2 Factor                    .0032          .0561         .0468         .0310         .000 
    3 Factor                    .0024          .0489         .0400         .0281         .000 
    4 Factor                    .0020          .0447         .0362         .0262         .000 
  Nonlinear Factor Analysis 
    1 Factor Quadratic          .0007          .0265         .0200         .0174 
    2 Factor Quadratic          .0004          .0190         .0146         .0122 

HSTLIT1 
  Linear Factor Analysis 
    1 Factor                    .0053          .0729         .0574         .0450         .000 
    2 Factor                    .0043          .0657         .0545         .0363         .000 
    3 Factor                    .0033          .0578         .0457         .0354         .000 
    4 Factor                    .0022          .0469         .0380         .0279         .000 
  Nonlinear Factor Analysis 
    1 Factor Quadratic          .0009          .0298         .0213         .0209 
    2 Factor Quadratic          .0004          .0204         .0157         .0129 

HSTLIT2 
  Nonlinear Factor Analysis 
    1 Factor Quadratic          .0013          .0358         .0228         .0276 
    2 Factor Quadratic          .0003          .0182         .0140         .0117 

ARGS 
  Nonlinear Factor Analysis 
    1 Factor Quadratic          .0011          .0335         .0239         .0235 
    2 Factor Quadratic          .0003          .0184         .0143         .0117 
    3 Factor Quadratic          .0003          .0175         .0136         .0111 

Models: 
  1 Factor Quadratic:  $Y_i = b_{i0} + b_{i1}\theta + b_{i2}\theta^2 + b_{i3}e_i$ 
  2 Factor Quadratic (LIT, AR):  $Y_i = b_{i0} + b_{i11}\theta_1 + b_{i12}\theta_1^2 + b_{i21}\theta_2 + b_{i22}\theta_2^2 + b_{i3}e_i$ 
  2 Factor Quadratic (HSTLIT2, ARGS):  $Y_i = b_{i0} + b_{i11}\theta_1 + b_{i12}\theta_1^2 + b_{i21}\theta_2 + b_{i22}\theta_2^2 + b_{i23}\theta_1\theta_2 + b_{i3}e_i$ 
  3 Factor Quadratic:  $Y_i = b_{i0} + b_{i11}\theta_1 + b_{i12}\theta_1^2 + b_{i21}\theta_2 + b_{i22}\theta_2^2 + b_{i31}\theta_3 + b_{i32}\theta_3^2 + b_{i33}\theta_1\theta_2 + b_{i34}\theta_1\theta_3 + b_{i35}\theta_2\theta_3 + b_{i4}e_i$ 

*  r_ij are the residual correlations 
** p-value associated with the chi-square test of goodness of fit. 